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TRANSFER ENTROPY ON RANK VECTORS 
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Abstract. Transfer entropy (TE) is a popular measure of infor- 
mation flow found to perform consistently well in different settings. 
Symbolic transfer entropy (STE) is defined similarly to TE but on 
the ranks of the components of the reconstructed vectors rather 
than the reconstructed vectors themselves. First, we correct STE 
by forming the ranks for the future samples of the response sys- 
tem with regard to the current reconstructed vector. We give the 
grounds for this modified version of STE, which we call Transfer 
Entropy on Rank Vectors (TERV). Then we propose to use more 
than one step ahead in the formation of the future of the response 
in order to capture the information flow from the driving system 
over a longer time horizon. To assess the performance of STE, 
TE and TERV in detecting correctly the information flow we use 
receiver operating characteristic (ROC) curves formed by the mea- 
sure values in the two coupling directions computed on a number of 
realizations of known weakly coupled systems. We also consider dif- 
ferent settings of state space reconstruction, time series length and 
observational noise. The results show that TERV indeed improves 
STE and in some cases performs better than TE, particularly in 
the presence of noise, but overall TE gives more consistent results. 
The use of multiple steps ahead improves the accuracy of TE and 
TERV. 

Keywords, bivariate time scries, coupling, causality, information 
measures, transfer entropy, rank vectors. 

1 Introduction 

The fundamental concept for the dependence of one vari- 
able Y measured over time on another variable X mea- 
sured synchronously is the Granger causality [I]. While 
Granger defined the direction of interaction in terms of 
the contribution of X in predicting Y , many variations 
of this concept have been developed, starting with linear 
approaches in the time and frequency domain (e.g. see 
[3J[3]) and extending to nonlinear approaches focusing on 
phase or event synchronization [4] [5] [6] , comparing neigh- 
borhoods of the reconstructed points from the two time 
series [7J 13 EJ EH HU 021 H3] , and measuring the informa- 
tion flow between the time series [1"4 ] IT5 ] [T6 ] \Tf \ [18 ] . 

Among the different proposed measures we concentrate 
here on the last class of measures, and particularly on the 
transfer entropy (TE) [2] and the most recent variant 
of TE operating on rank vectors, called symbolic trans- 
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fer entropy (STE) [17] (see also [IH| for a similar mea- 
sure) . There have been a number of comparative studies 
on information flow measures and other coupling mea- 
sures giving varying results. In all the studies where TE 
was considered, it performed at least as good as the other 
measures [20] [2TJ [22] . The STE measure is proposed as 
an improvement of TE in real world applications, where 
noise may mask details of the fine structure, that can be 
better treated by coarse discretization using ranks instead 
of samples. 

We propose here a correction of STE. In the definition 
of TE the observable of the response at one time step 
ahead is a scalar, but in STE it is taken as a rank vec- 
tor at this time index. We modify STE to conform with 
the definition of TE and give grounds for the correctness 
of this modification. Further, we allow for the future of 
the response to be defined over more than one time steps. 
To the best of our knowledge this has not been imple- 
mented in TE, but it is the core element in the informa- 
tion flow measures of mean conditional mutual informa- 
tion |18] and coarse-grained transinformation rate |15] . In 
many applications on interacting flow systems, the sam- 
pling time may be small and a single step ahead may not 
regard the time of the response to the directed coupling, 
as in electroencephalography (EEG) [TjJ [23] , and in the 
analysis of financial indices [16] [24] . The same may hold 
for maps: the transfer of information may better be seen 
over more than one iteration of the interacting maps. We 
compare TE, STE and our correction of STE on mea- 
suring weak directed interaction in some known coupled 
systems and for different state space reconstructions, time 
series lengths and also in the presence of noise. We also 
investigate the change in the performance of these mea- 
sures when defining them for more than one step ahead. 

In the following, TE and STE measures are presented 
briefly in Section [2] and the proposed modification of TE 
is described in Section [3] Then the results of the simu- 
lation study comparing the proposed measure to TE and 
STE are presented in Section 2] and discussed in Sec- 
tion [5] 

2 Information flow measures 

Let us suppose that a representative quantity of sys- 
tem X is measured giving a scalar time series {xt}^Li 
and that we have respectively {yt}tLi for Y, where 
X and Y possibly interact. Using the method of de- 
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lays the reconstructed points from the two time se- 
ries are x t = [xuXt-T*,---,Xt-(m x -i)T x ] and Yt = 
[yt,yt-r y , ■ ■ ■ ,y t ~( mv -i)r v }, allowing different delay pa- 
rameters t x , T y and embedding dimensions m Xl m y for 
the systems X and Y, respectively. 

Transfer entropy (TE) is a measure of the information 
flow from the driving system to the response system. 
Specifically, TE estimates the entropy in the response 
system caused by its connection to the driving system, 
accounting for the entropy generated internally in the re- 
sponse system [TJ]. TE for the causal effect of system 
X on system Y can be defined in terms of the Shannon 
entropy H(x) = ^2p{x)\ogp{x) as 

TE X ^y = (1) 

-H(y t+ i,x t , y t ) -I- H(x t) y t ) + H(y t+1 ,y t ) ~ H(y t ), 
or directly in terms of distribution functions as 



TEx^y = ^p(y f+ i,x f ,y t )log 



p(yt+i\x.t,yt) 
p(yt+i\yt) 



(2) 



where p(yt+i,Xt,y t ), _p(y t+ i|x t ,y t ), and p{y t +i\yt) are 
the joint and conditional probability mass functions 
(pmf). The summation is over all the cells of a suit- 
able partition of the joint variable vectors appearing as 
arguments in the pmfs or entropy terms. 

The estimation of TE requires the estimation of the 
pmfs in eq.©, or the probability density functions as- 
suming the integral form and no binning. The pmfs are 
estimated directly by the relative frequency of occurrence 
of points in each cell, but finding a suitable binning may 
be challenging [2S] [55]. Moreover, for high-dimensional 
reconstructions, the binning estimators are data demand- 
ing. Therefore estimators of the probability density func- 
tions are more appropriate for TE estimation, such as 
kernels [37], nearest neighbors [35], and correlation sums 
|29j . We follow the latter approach to estimate TE and 
recall first that without assuming discretization each term 
of the form -ff(x) in ([TJ expresses the differential en- 
tropy of the vector variable x. The differential entropy 
can be approximated from the correlation sum C(x) as 
-ff(x) ~ lnC(x) +m\nr, where C(x) is the estimated cu- 
mulative density of inter-point distances at embedding di- 
mension m and for a suitably small distance r |29j . Thus 
TE is estimated by the correlation sums as 



TE 



x->y 



log 



C(y t+ i,x.t,yt)C(y t ) 
C(x t ,y t )C(y f+ i,y t )' 



(3) 



where C(y t+ i,x t ,y t ), C(y t ), C(x t ,y t ) and C(y t +i,y t ) 
are the correlation sums for the points of the form 
[y t+ i,x t ,y t ], y u [x t ,y t ] and [yt+i,y t ], respectively. 
The corresponding vector dimensions are 1 + m x + 
m y , rriy , m x + m v and 1 + m y . To account for the different 
dimensions, we use the standardized Euclidean norm for 
the distances. 

The so-called symbolic transfer entropy (STE) is de- 
rived as the transfer entropy defined on rank vectors 



formed by the reconstructed points [17) . For each point 
yt, the ranks of its components in ascending order as- 
sign a rank vector y t — [r% , r^-, . • . , r my } , where rj £ 
{1, 2, . . . , m y } for j = 1, . . . , m y , is the rank order of the 
component y t _tj_ 1 \ T (for two equal components of y t 
the smallest rank is assigned to the component appearing 
first in y t ). Substituting also yt+i in eq.([T]) with the rank 
vector at time t + 1, yt+i, STE is defined as 

STE X ^ Y = (4) 

-H(y t+ i,x u y t ) + H(± t) y t ) + H(y t+1 ,y t ) - H(y t ). 

The estimation of STE from eq. (j4| is straightforward as 
the pmfs are naturally defined on the rank vectors. There 
is a great advantage of using a rank vector y t over a bin- 
ning of yt, say using b bins for each component: the pos- 
sible vectors from binning are b my while the possible com- 
binations of the rank vectors are m y \. For example, for 
b — m y — 4, there are 256 cells from binning and only 24 
combinations of rank vectors. Still, the estimation of the 
probability of occurrence of a rank vector becomes unsta- 
ble as the dimension increases. Especially, for the joint 
vector of ranks [yt+i,Xt,yt] the dimension is 1m y +m x , 
for which the equivalent of TE is [yt+i,xt,yt] and has 
dimension 1 + m T + m v . 



3 Modification of symbolic trans- 
fer entropy 

The conversion of the scalar yt+i to the rank vector yt+i 
seems to have been chosen in order to express yt+i in 
terms of ranks in [T7] . Under this conversion, STE is not 
the direct analogue to TE using ranks instead of samples. 
The problem is not so much the use of the scalar yt+i 
or the vector yt+i in the definition of TE in eq.([l]) or 
cq.© because for r v = 1 p(y t+ i,x tl y t ) =p(y i+ i,x t ,y t ), 
as all components but yt+i of the vector yt+i are also 
components of y t . The same holds for the conditional 
pmfs in eq.([2]) and the two correlation sums in which yt+i 
appears in eq.Q. We elaborate on the implication of the 
use of yt+i below. 

Let us first assume that t v = 1. A first problem lies 
in the fact that when deriving the rank vector yt+i as- 
sociated with yt+i, the rank of the last component of 
yt, Vt-m +ij is not considered. As an example, consider 
the vector y t = [y u y t -i, yt— 2, Vt-z\' with a correspond- 
ing rank vector y t = [1,2,3,4], i.e. the samples decrease 
with time. If the decrease continues at the next time 
step then yt+i = [1,2,3,4], if yt+i is between y t and 
yt-i then yt+i = [2,1,3,4], if it is between yt-i and 
y t -2 then y 4+1 = [3,1,2,4], and finally if y t+1 is larger 
than yt-2 (the largest of all components in yt+i) then 
yt+i = [4,1,2,3]. The 4 possible scenarios are shown in 
Fig. Q] The definition of rank vector yt+i accounts only 
for the possible rank positions of yt+i with respect to the 
last m y — 1 samples, ignoring the sample yt- m +i, here 
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Figure 1: Sketch of a position of samples 

j/t_3, t/t-2) Vt-iiVt and the possible rank position of 
Vt+i together with the corresponding rank vector yt+i 
defined for STE and the actual rank of yt+i considering 
all 5 samples. 

Vt-3- With regard to the same example, yt+i = [4, 1, 2, 3] 
assigns to both cases y t -2 < yt+i < yt-z and Vt-z < Vt+i 
(see Fig. [1]). In the entropy or probability terms of the 
definition of TE, yt+i appears together with y t , and there 
are 5 possible rank positions of yt+i in the augmented vec- 
tor \yt+i,yt,yt-i,yt-2,yt-3], as shown in Fig. [TJ Thus for 
m y = 4 there are 5! = 120 different rank orders for the 
joint vector [j/t+i,yt], but when forming the joint rank 
vector [yt+i,yt] (as in the computation of STE) there 
are only 4! ■ (4!/3!) = 96 possible rank orders. In gen- 
eral, there are (m y + 1)! possible rank orders for the joint 
vector [j/t+i, yt], but STE estimation represents them in 

m v- • ( W "-i)! rank orders of [y*+i>y*]- 

The pmf of the rank vector derived from [yt+i,Yt] and 
the pmf of the rank vector [yt+i, yt] are shown in Fig. [2] 
for uniform white noise data and m y — 3. There are 
{ply + 1)! = 24 cquiprobable rank orders for [yt+i,yt] 
(see Fig. [2^) but only m y \ ■ ."'^w = 18 different vectors 
[yt+i,yt] are found, where m y \ = 6 of them have about 
double probability, each corresponding to two distinct 
rank orders that could not be distinguished (Fig. ®>). 
This results in the underestimation of the Shannon en- 
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Figure 2: (a) Estimated pmf for the ranks of [j/t+i,yt] 
with m y — 3 (probabilities are in ascending order), where 
the samples yt are from a uniform white noise time series 
of length N — 10 16 . (b) Same as in (a) but for the rank 
vector [yt+i,y t ]. 

tropy. Using N = 10 16 samples and the ranks of [yt+i, yt] 
we found H — 4.5846 bits and using [yt+i,yt] we found 
H = 4.0865 bits, while the true Shannon entropy is 
ff = -log 2 (l/24) =4.5850. 



Assuming a time step ahead T > 1, there are two 
scenarios to follow for the future samples of system Y: 
a single sample at time t + T, yt+T, or all the sam- 
ples in the horizon of length T, which we denote as 
yf = [yt+i, ■ ■ ■ , yt+r]- In the first case the possible rank 
orders of [yt+T, Yt] are again {m y + 1)! and for the second 
case the possible rank orders of [yf , y t ] are (m y +T)\. If 
instead we follow the form in STE and substitute yt+r to 
yt+i, we have m y \ ■ ( m ^" T )i possible ranks for [y t +T,yt], 
which regards neither of the two joint vector forms. For 
example, for m y = 3 and T = 2, using the form of STE 
[y t+ T,yt] we have 36 possible rank orders, while for the 
joint vector form with a single sample T ahead, [yt+r, yt], 
the possible rank orders are 24 and for all T samples 
ahead, [y^,yt], they are 120. For uniform white noise 
data, the true entropy for the (m + l)-dimensional aug- 
mented vector is H = 4.5850 and for (m + T)-dimensional 
augmented vector is H — 6.9069. Using the same esti- 
mation setup as before, we estimated correctly these en- 
tropies as H — 4.5847 and H = 6.9056 using the rank 
orders of [yt+T, y~t] and [yf,yt], respectively. Using the 
rank vector [yt+T,yt] we estimate H — 4.9709, which 
constitutes overestimation if considering a single future 
sample at T time steps ahead and underestimation if con- 
sidering all the samples in the T time steps ahead. 

For the future response at time T, we use the vector yj 
of all the samples in the future horizon t + 1, . . . , t + T. 
Thus we propose to substitute yt+T in STE of eq.(dl) by 
ff = [ijt+i, ■ • ■ , yt+r], the ranks of yf = [y t +i, ..., yt+r] 
in the augmented vector [y^,yt]. The proposed measure 
of transfer entropy on rank vectors (TERV) for T steps 
ahead is 

TERV^^y = (5) 

-H(yf, xt, yt) + H (x t , y t ) + H (y t T , y t ) - H(y t ). 

Since yf appears together with y t in the two entropy 
terms in eq.©, one can define [yf, yt] as the ranks of the 
augmented vector [y t +T, Z/t+T-i, ■ • ■ , 2/t+i, yt]- This is ac- 
tually equivalent of taking the ranks of y t independently 
(as they appear in the second and forth term in eg. (|5|)). 

The use of all the ranks for times t + 1, . . . ,t + T aims 
at capturing the effect of X on the evolution of the time 
series of Y up to T time steps ahead. Similar reasoning 
for T > 1 was used for other information flow measures 
[TBI HI] and we have used T > 1 also for TE in [22]. The 
TERV measure is the direct analogue to TE using ranks 
and extends the measure of information flow from X to 
Y at time t for a range of T time steps ahead t. The use 
of y^" instead of yt+T increases the dimension of the joint 
space from m x + m y + 1 to m x + m y + T and can affect 
the stability of the estimation. However, the results of 
a simulation study were in favor of yf against yt+T for 
both TE and TERV. 

Finally, we note that when a lag r y > 1 is used for 
the state space reconstruction of yt, there are up to m y \ ■ 
m y \ different rank vectors [yt+T,yt] in the computation 
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of STE. On the other hand, for TERV there are (T + 
m y )\ different rank vectors [yf ,yt\- Thus for r y > 1, 
the distortion of the domain of the rank vectors by STE 
may be large, e.g. for r y = 2 and T = 1, the pmfs and 
entropies are computed on (m y + 1)! different rank orders 
for TERV and ? 



m y \ for STE. 



4 Estimation of information mea- 
sures on simulated systems 

As it was shown for the example of uniform white noise 
the distortion of the domain of the rank vectors [yf,yt] 
using the rank vectors [yt+T,yt] instead has a direct ef- 
fect on the estimation of entropy. While for uncoupled 
systems X and Y the entropy terms involving [yt+T,yt] 
cancel out in the expression of TERV (and respectively 
for STE), in the presence of coupling some bias is intro- 
duced in the estimation of the coupling measure by STE. 
Using TERV instead this bias is removed. 

We compare the estimation of coupling (strength and 
direction) with the measures TE, STE and TERV on sim- 
ulated systems. We first standardize each time series to 
have mean zero and standard deviation one, and this al- 
lows us to define a fixed radius for all systems in the 
computation of TE, which we set r = 0.15. This choice 
is a trade-off of having enough points within a distance r 
to assure stable estimation of the point distribution and 
maintaining small neighborhoods to preserve details of 
the point distribution. Still, for high-dimensional points, 
even this radius may be insufficient to provide stable es- 
timation. 

We start with two unidirectionally coupled Henon maps 

m 



Xt+l 

yt+i 



1.4 



0.3x t 



1.4 - cx t vt + (1 - c)y 2 t + 0.3y t _i 



with coupling strengths c =0,0.05,0.1,0.15,0.2,0.3,0.4,0.5 
and 0.6. The results on the coupling measures TE, STE 
and TERV for T — I, t x — T y — I and m x = m y = 2 
are shown for 100 noise- free bivariate time series of length 
N = 1024 in Fig. [3j TE has the smallest variance and it 
seems to give the best detection of the correct direction 
of coupling even for very weak coupling, whereas STE 
performs worst. To quantify the level of discrimination of 
the correct direction of information flow, X — >• Y, often 
the net information flow is used, defined as the difference 
of the coupling measure in the two directions. Here, we 
assess the level of discrimination in a statistical setting by 
computing the area under the receiver operating charac- 
teristic (ROC) curve on the 100 coupling measure values 
for each direction, which we denote AUROC (e.g. see 
[30]). For uncoupled systems, we expect that AUROC be 
close to 0.5. For an information flow measure to detect 
coupling with great confidence AUROC has to be close 
to 1. In Fig. [HH, the AUROC shows that TE detects cou- 
pling with great confidence and obtains AUROC=l for 
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Figure 3: (a) Median (solid line) and 12.5% and 87.5% 
percentiles (dashed lines) of TE computed on 100 noise- 
free realizations of length N = 1024 from the system 
of two unidirectionally coupled Henon maps for varying 
coupling strengths. The other parameters are T = 1, 
T x = T y = 1 and m x — m y = 2. The direction X — > Y 
is shown with black lines and Y — > X with grey (online 
cyan) lines, as shown in the legend, (b) Same as (a) but 
for STE. (c) Same as (a) but for TERV. (d) AUROC 
computed on the 100 realizations for each of the two di- 
rections and for the measures TE, STE and TERV, as 
given in the legend. 



as low coupling strength as c = 0.1, followed by TERV 
reaching the same level of confidence at c = 0.15, while 
STE reaches this level only at strong coupling (c = 0.5). 

The performance of the coupling measures changes in 
the presence of noise. For the same setup as that in 
Fig. [3j but adding to the bivariate time series 20% Gaus- 
sian white noise, we observe that TERV performs best, 
followed by TE and having STE with the smallest in- 
crease in the direction X — >■ Y with the coupling strength 
(see Fig. |4j. TERV has smaller variance than TE for 
small coupling strengths, which increases its discriminat- 
ing power. Comparing with the noise-free case in Fig. [3J 
TERV does not seem to be much affected by the addition 
of noise, but TE does not have the same robustness to 
noise. It is noted that using r — 0.15 was particularly 
suitable to maintain some stability in TE on the noisy 
data. The same simulations for r = 0.1 gave much worse 
results 31 . The AUROC curves in Fig. [5H are ordered 
with TERV giving the highest and STE the lowest AU- 
ROC for all coupling strengths. 

We have estimated TE, STE and TERV on the coupled 
Henon system for different settings of embedding dimen- 
sions, time steps ahead T, time series length N and noise 
level. For N small and m y large and mostly for noisy 
time series, the computation of TE was unstable due to 
the lack of points within the given radius. This explains 
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Figure 4: As Fig. [3j but with Gaussian white noise with 
standard deviation 0.2 added to the standardized time 



series. 



that TE has smaller variance for larger time series, and 
consequently better discrimination in the two directions 
of coupling. The measures do not vary much with the 
embedding dimensions and STE shows the largest depen- 
dency, especially when m x > m y . The best results for all 
measures were obtained for m x — m y . 

In Fig. [5] the AUROC is shown for the three mea- 
sures as a function of m y , where m x = m y , for very weak 
coupling (c = 0.1), and for one and three steps ahead, 
small and large time series and for noise-free and noisy 
Henon data. It seems that estimating the information 
flow for T — 3 increases the detection of correct direction 
of weak coupling for TE and TERV, but not for STE. 
For the noise-free data, the differences in AUROC among 
the three measures are small and all measures reach the 
highest level of discrimination of the two directions when 
m y > 2. For m y = 2, AUROC^l is still reached by TE 
and TERV but only with T = 3, whereas the AUROC 
is much smaller for STE regardless of T. This pattern is 
the same for both small and large N. For noisy data, all 
measures perform worse and their AUROC shows strong 
dependence on both m y and T. Similarly to the noise- 
free case, the AUROC is larger for T = 3 than for T = 1 
for TE and TERV, but not for STE. The AUROC of TE 
decreases with m y and for m y > 2 is lower than the AU- 
ROC for both STE and TERV. TERV obtains the highest 
AUROC values with T = 3 giving the overall best results. 
The above results are consistent for the two time series 
lengths shown in Fig. [SJd and d with AUROC values in- 
creasing from N = 1024 to N = 4096. 

Similar simulations have been run for a Rossler system 
driving a Lorenz system given as (subscript 1 for Rossler, 



Figure 5: (a) AUROC computed for different m y (m x = 
m y ) on 100 realizations of the weakly coupled Henon sys- 
tem (c = 0.1) for each of the two directions, for the mea- 
sures TE, STE and TERV and for time steps ahead T, 
as given in the legend. The time series are noise-free and 
N = 1024. (b) As in (a) but for 20% additive Gaussian 
white noise, (c) and (d) are as in (a) and (b) but for 
N = 4096. 

2 for Lorenz) 
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-6(xi + 0.2j/i) 


z/2 = 28^2 - J/2 - x 2 z 2 + cy\ 
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for coupling strengths c= 0,0.5,1,1.5,2,3,4,5 [31j. The 
observed variables are x-i and j/2 and the sampling time 
is t s = O.lsec. Here, the results are more varying than 
for the Henon system. First, there is stronger dependence 
of all measures, and particularly STE and TERV, on the 
two embedding dimensions. Again the best results are 



obtained for m x — m y , while for m x 



< m y the measure 



values in the correct direction X — > Y are increased and 
for m x > m y the opposite is observed leading to erroneous 
detection of direction of interaction. The effect of the dif- 
ferent formation of the rank future vector in STE and 
TERV can be better seen for T > 1 and for small embed- 
ding dimensions. As shown in Fig. [Bl for m x = m y = 3, 
while for T = 1 both rank measures tend to give larger 
values in the opposite wrong direction Y — > X , for T = 3 
STE continues to give the same result but TERV points 
to the correct direction at least for intermediate values of 
c. 

The rank measures suffer from positive bias that in- 
creases with T and this is more obvious in TERV. When 
the two systems have different complexity the bias tends 
to be larger in the direction from the less complex 
(Rossler) to the more complex (Lorenz) system. For 
T = 3 in Fig. (SJd and d, the rank measure values are 
larger for the direction X — > Y when c = 0. This bias 
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Figure 6: (a) Median (solid line) and 12.5% and 87.5% 
percentiles (dashed lines) of each of the three cou- 
pling measures computed on 100 noise-free realizations 
of length N = 1024 from the Rossler-Lorenz system for 
varying coupling strengths. The other parameters are 
T = 1, t x = t v = 1 and m x = m y = 3. The direc- 
tion X — > Y is shown with black lines and Y — > X with 
grey (online cyan) lines, as shown in the legend, and the 
TE measure is at the top panel, STE in the middel, and 
TERV at the low panel, (b) As in (a) but for T = 3. 
(c) and (d) are as in (a) and (b) but for 20% additive 
Gaussian white noise. 



is present regardless of the coupling strength, so that the 
increase of TERV with the coupling strength still can be 
observed. However, AUROC would not give useful re- 
sults as the discrimination would be perfect with TERV 
for all c including c = 0. The positive bias of the rank 
measures causes the lack of significance, i.e. obtaining 
positive values in the absence of coupling, and this has 
serious implications in real world applications, where also 
the presence of interaction is investigated. 

The TERV measure has an advantage over TE in that 
it is more stable to noise. For the noise-free coupled 
Rossler-Lorenz system, TE detects clearly the direction 
of coupling and its performance is enhanced when T in- 
creases from 1 to 3. However, when noise is added to 
the data the estimation of TE is not stable and the large 
variance does not allow to observe different levels of TE 
in the two directions. The variance is larger for T = 3 
due to larger dimension of the state space vectors in the 
estimation of the correlation sums. 

We have made the same simulations on two weakly cou- 
pled Mackey- Glass systems given as 



0.1a* 





Ax 0.2lE t _A x 




dt 1 + xf_ Aa 


dy 

di ~ 


0.2y t -A y 0.2x t _ A:c 



O.lirt, 



(6) 



where again the driving is from the first system X to the 
second system Y [33J . The two systems can have different 
complexity determined by the delay parameters A^ and 
A a . We let each A parameter take the values 17, 30, and 
100 that, in the absence of coupling, regard systems of 
correlation dimension at about 2, 3 and 7, respectively 
[34], Thus we have 9 different coupled Mackey-Glass sys- 
tems. All systems are solved using the function dde23 of 
the computational environment MATLAB and are sam- 
pled at t s =4. 

The results are quite similar to the results of the 
Rossler-Lorenz system. There is large variation of all 
measures, and particularly the rank measures, in the de- 
tection of direction and strength of coupling, depending 
on all tested factors: noise, embedding dimensions, fu- 
ture time horizon, and system complexity. The effect of 
noise is large on TE but small on STE and TERV. Re- 
garding the embedding dimension, m x < m y tends to 
increase the measure with c more in the correct direction 
X — > Y, m x > m y tends to increase the measure in the 
opposite and false direction Y — >• X, and the best bal- 
ance is obtained for m x = m y . The rank measures take 
values at different positive levels in the two directions at 
no coupling when the two systems have different com- 
plexity. Specifically, the direction from the less to more 
complex system is the one that has the largest positive 
bias suggesting erroneously causal effect in this direction 
when c = 0. This bias may mask the difference of the 
rank measure values in the two directions for c > 0. Thus 
better results can be obtained at small embedding dimen- 
sions m x — m y and also for T > 1. 

Results for the coupled Mackey-Glass system with 
A x = A y = 30 and m x = m y — 3 are shown in Fig. [7] 
For T = 1 both rank measures do not find differences in 
the two directions, while TE takes larger values for c > 
in the correct direction X — > Y (see Fig. [7^). When 
T = 3, both rank measures increase more in the correct 
direction and give significant differences in the two direc- 
tions, as shown in Fig. [7)3. For this example, the rank 
measures obtain the same level for c = and AUROC 
can indeed illustrate the discrimination for c > 0. As 
shown in Fig. [7^, the AUROC for both rank measures is 
at the level of the TE measure only for T — 3. In the 
presence of noise, TE again tends to have larger variance 
(even larger for T = 3) and the discrimination in the two 
directions is not that clear, particularly for larger cou- 
pling strengths (c = 0.15, 0.2), as shown in Fig. |7J; and d. 
On the other hand, the rank measures perform similarly 
to the noise-free case, with TERV performing best and 
giving the largest AUROC, as shown in Fig. [7f. 

It should be noted that the overall results on the cou- 
pled Mackey-Glass systems are in favor of TE that turns 
out to be less sensitive than STE and TERV to the vari- 
ations in system complexity and embedding dimensions, 
while on the other hand it is more sensitive to the pres- 
ence of noise. 
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Figure 7: (a-d) As in Fig. [6j but for the coupled Mackey- 
Glass system with A x = A y = 30, TV = 4096 and m x = 
rriy = 3. The noise here is at the level of 10%. (e) AUROC 
computed on the measure values displayed in (a) and (b) 
for the noise- free case and T = 1 and T = 3, respectively, 
as given in the legend, (f) The same as (e) but for the 
noisy data. 



5 Discussion 

The use of ranks of consecutive samples instead of sam- 
ples themselves in the estimation of the transfer entropy 
(TE) seems to gain robustness in the presence of noise, a 
condition often met in real world applications. This was 
confirmed by our results in the simulation study. Given 
that TE based on ranks can be a useful measure of in- 
formation flow and direction of coupling, we have stud- 
ied the recently proposed rank-based transfer entropy, 
termed symbolic transfer entropy (STE), and suggested 
a modified version of STE, which we termed TE on rank 
vectors (TERV). The first modification is to use the rank 
of y t +i (one time step ahead for the response time series) 
in the augmented vector comprised of the reconstructed 
state vector at time t, y t , and yt+i, instead of consider- 
ing the whole rank vector for yt+i as done in STE. We 
showed that indeed this correction gives accurate estima- 
tion of the true entropy of the rank vector derived from 
the joint vector of y t and yt+i- Further, we suggested to 
allow the time step ahead to be T > 1 and use the ranks 
of all samples at the T future times (j/t+i, . . . , yt+T) de- 



rived from the augmented vector containing the current 
vector y t and these future samples. 

The proposed TERV measure was compared to TE and 
STE by means of simulations on some known coupled sys- 
tems, and the level of detection of the coupling direction 
was also assessed by the area under the receiver oper- 
ating characteristic curve (AUROC). We found that the 
detection of the correct coupling direction, as well as the 
correct identification of uncoupled systems and the esti- 
mation of coupling strength when present, varied across 
the three measures and depended on the presence of noise, 
the state space reconstruction (we varied both embedding 
dimensions m x and m y for the driver and the response 
system but used fixed delays t x = r y = 1), the future 
horizon T, the time series length, and the complexity of 
the systems. The results are summarized as follows. 

1. TE estimated by correlation sums has increased vari- 
ance when noise is added to the data, which may 
mask the detection of the direction of coupling. STE 
and TERV are affected much less by noise and often 
perform better than TE on noisy data. 

2. All measures are dependent on the embedding di- 
mensions and the best results are when equal span 
of information from the two systems is passed to the 
reconstructed vectors, i.e. m x = m y , a condition set 
arbitrarily, but apparently correctly as our simula- 
tions justify, in most works on coupling measures. If 
the embedding dimension for the driving system is 
larger the measure tends to be larger in the wrong 
direction of interaction. Rank measures (STE and 
TERV) tend to be more sensitive to the selection of 
the two embedding dimensions than TE. 

3. When information flow is measured by TE and 
TERV over a future horizon of length T > 1 it can 
detect better than for T = 1 the correct direction 
and strength of coupling, provided that the estima- 
tion of the entropy terms is stable. Note that using 
a larger T increases the dimension of the future re- 
sponse vector in the definition of TE and TERV and 
consequently the data requirements. Thus the sta- 
bility of the estimation depends on the length of the 
time series, the level of noise in the data and the two 
embedding dimensions. STE does not show not the 
same improvement in performance when T > 1 due 
to the way the rank future vector of the response is 
constructed. 

4. The measures using ranks (STE and TERV) have 
larger positive bias than TE that depends on embed- 
ding dimension, time series length and system com- 
plexity. For example, all measures increase with the 
embedding dimension (also when m x — m y ), so that 
even in the absence of coupling there are significantly 
larger than zero. In the simulations with increasing 
coupling strength c, the difference of the measures on 
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uncoupled and coupled systems still could be main- 
tained, but in applications where a single case of cou- 
pling is to be investigated, the lack of significance. 

5. For different complexity there is different bias in the 
two directions and the rank measures tend to differ 
at c = 0. TE shows this effect at a lesser extent. 
The largest bias is in the direction from the less to 
more complex system. When the two systems are of 
the same complexity, the bias is the same in both 
directions allowing the rank measures to detect well 
the coupling direction and the strength of coupling. 

Given the above finding, overall TERV gave better dis- 
crimination of the direction of coupling (higher AUROC) 
than STE, and when the data were noisy also better than 
TE in many cases. In particular, the use of T > 1 im- 
proved the performance of TERV and TE but not STE. 
The results on TE yield the particular estimate using 
correlation sums. A small scale simulation has showed 
that binning estimates performed worse, especially when 
the dimension increased (embedding dimension and T), 
and this is attributed to the problem of binning for high 
state space dimensions. On the other hand, the nearest 
neighbor estimate [28] was more stable, particularly on 
noisy data and high dimensions. Further investigation on 
the estimates of TE is obviously needed. 
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